This project attempts to provide insights into EV adoption patterns using approaches from spatial data science.
Seattle-Tacoma-Bellevue, WA Portland-Vancouver-Hillsboro, OR-WA Spokane-Spokane Valley, WA
Specifically, we want to understand how urbanicity plays in shaping the observed landscape of EV population in state of Washington.
Literature Review
Methodology
Set up
packages
loading files
Reading layer `ev1203' from data source
`/Users/toyuan/24-manuscript/data/ev1203.gpkg' using driver `GPKG'
Simple feature collection with 208002 features and 25 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -124.6274 ymin: 45.596 xmax: -117.0455 ymax: 48.99205
Geodetic CRS: WGS 84
Reading layer `wa_poly' from data source
`/Users/toyuan/24-manuscript/data/wa_poly.gpkg' using driver `GPKG'
Simple feature collection with 1 feature and 14 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -124.849 ymin: 45.54354 xmax: -116.9161 ymax: 49.00244
Geodetic CRS: NAD83
Reading layer `urban_shape' from data source
`/Users/toyuan/24-manuscript/data/urban_shape.gpkg' using driver `GPKG'
Simple feature collection with 56 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -124.1802 ymin: 45.78668 xmax: -117.0418 ymax: 49.0021
Geodetic CRS: WGS 84
Reading layer `stations' from data source
`/Users/toyuan/24-manuscript/data/stations.gpkg' using driver `GPKG'
Simple feature collection with 2800 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -124.6629 ymin: 25.91606 xmax: -73.98274 ymax: 48.99526
Geodetic CRS: WGS 84
Reading layer `ct1202' from data source
`/Users/toyuan/24-manuscript/data/ct1202.gpkg' using driver `GPKG'
Simple feature collection with 1784 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -124.849 ymin: 45.54354 xmax: -116.9161 ymax: 49.00244
Geodetic CRS: WGS 84
Reading layer `ev1202-3' from data source
`/Users/toyuan/24-manuscript/data/ev1202-3.gpkg' using driver `GPKG'
Simple feature collection with 208002 features and 24 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -124.6274 ymin: 45.596 xmax: -117.0455 ymax: 48.99205
Geodetic CRS: WGS 84
Reading layer `zcta-tl' from data source
`/Users/toyuan/24-manuscript/data/zcta-tl.gpkg' using driver `GPKG'
Simple feature collection with 605 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -124.7365 ymin: 45.54354 xmax: -116.9161 ymax: 49.00244
Geodetic CRS: WGS 84
Reading layer `nov-ev' from data source
`/Users/toyuan/24-manuscript/data/nov-ev.gpkg' using driver `GPKG'
Simple feature collection with 17025 features and 15 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -124.6274 ymin: 30.41628 xmax: -71.06739 ymax: 48.99976
Geodetic CRS: WGS 84
Before presenting our evidence of clustering in the spatial pattern of EVs, we would like to start with some clarification about the “points” and their locations in our data. Each observation in the dataset represents an EV with its unique DOL id. However, the longitude and latitude information corresponds to the centroid of that Zip Code Tabulation Area associated with the recorded address of the EV owner. Thus, 208,002 distinct EVs are matched to only 548 distinct points.
From the frequency table of counties, we can see that EV population exists across all counties in WA. When plotting EV count per county as a statistic, a skewed distribution gives us the first evidence of the variance in EV adoption between counties.
A map visualization of the EV locations and counts provides us a spatially clustering impression. Furthermore, when looking at the map we tended to suggest seeing clusters and identify them with the three of the main metropolitan areas. For example, Seattle-Tacoma-Bellevue Metropolitan area to the north west clusters; one mid-western cluster to Spokane-Spokane Valley Metropolitan area; one approximately in the county Clark with Vancouver city to take part of the cross states Portland-Vancouver-Hillsboro Metropolitan area.
urb_map <-ggplot() +geom_sf(data = ct_sf, fill ="lightgray", color ="white", size =0.2) +geom_sf(data = ev_sf, aes(size = count), color ='cornflowerblue', alpha =0.5) +geom_sf(data = urban_poly, aes(fill ="Urban Areas"), color ="pink", alpha =0.3) +scale_size_continuous(name ="EV Registrations", range =c(1, 12)) +scale_fill_manual(name ="Legend", values =c("Urban Areas"="pink")) +labs(title ="Distribution of EV registrations in WA",subtitle ="urban areas" ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5, size =16),plot.subtitle =element_text(hjust =0.5, size =12) )ggsave("image/manu-urb-ev.png", plot = urb_map)urb_map
Using the definition provided by US Census Bureau, we can assign urban or rural labels to each registration. We learned from this classification that 180,490 registrations took place in urban areas, leaving 27,512 in rural areas. This spatial unevenness is evident when we impose the polygons of urban areas onto the EV population distribution.
Spatial Autocorrelation
In order to quantitatively measure the overall clusterness of our data and, regarding that clustering pattern is defined by having positive spatial autocorrelation, we go for the calculation the Global Moran’s I statistic.
global moran’s I
To estimate the autocorrelation value for our observations from Moran’s I formula, we have to specify our statistic of interest, a neighbor definition, and prepare a neighborhood relation and assign weights to neighbors.
We tried two ways to use spdep::moran for calculating Moran’s I. The first step here is to aggregate observed points into polygons, ZIP Code Tabulation Areas that contains 605 polygons and Census Tracts of 1784 polygons are two target of aggregation.1 When constructing the neighborhood relation both Rook and Queen defintions of neighbor are tried, and we use the default to assign equal weight to all neighbors. Both attempts are to point us to a postive autocorrelation of our observations, suggesting spatial clustering. To make sense of how significant our observed Moran’s I statistic is, we further did a permutation test using spdep::moran.mc() and acquired a pseudo p-value of 0.001 the most extreme value we can get in the case of 999 Monte Carlo Simulations, that is, the likelihood that our data generated from Complete Spatial Random process is a low one.
calculate the Moran’s I statistic sensitive to irregularly distributed polygons2
Moran’s I using ZCTAs as bounding polygons
[1] 0.6457255
a positive autocorrelation, significant?
Monte-Carlo simulation of Moran I
data: zcta_sf$n_ev
weights: zcta_listw
number of simulations + 1: 1000
statistic = 0.64146, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Moran’s I using Census Tracts as bounding polygons
Rook definition of neighbor
[1] 208002
Moran’s I value
[1] 0.5136527
visualize
create sf containing the points corresponding to the centroids of the census tracts
Monte-Carlo simulation of Moran I
data: ct_sf$n_ev
weights: ct_listw
number of simulations + 1: 1000
statistic = 0.51365, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
LISA: Local Moran’s I
With evidence of clustering globally, we also want information about where we can find the clusters. In this case, spdep::localmoran can help use with a Moran’s I statistic for each census tract. if a region i has a positive deviation from the mean and its neighbor also have high values to the values is going to be high. In our case, we are comparing the EV counts of the census tract i to the mean count and how i covariate with its neighbors.
[1] "localmoran" "matrix" "array"
[1] 1784 5
[1] 1784 26
ev_ct_local <- ct_sf7 |>ggplot() +geom_sf(aes(fill=as.numeric(local_i))) +labs(title ="Local moran's I by Census Tracts",fill ="local_i" ) +scale_fill_gradient2(low="darkblue", high="red",transform ="pseudo_log")# +scale_fill_viridis_c(na.value ="gray")# ggsave('image/ct_localmoran.png', ev_ct_local)ev_ct_local
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.28334 0.03044 0.19644 0.51365 0.46033 26.79675
To understand the first-order properties of EV adoption as a point pattern, we can look at our EV population distribution together with the human population distribution in WA. They appeared to follow very similar distribution, and we do recognize that the core of the definition of an urban area is the spatial distribution of population. Thus, it will be helpful to set aside other characteristics we associate with the urban label at the time, examine the relationship between the two populations in WA. We expect a better understanding of this relationship can guide us to extract the parts of EV adoption pattern, explainable by population density and unexplainable that how the effectiveness of explaining EV adoption in terms of population density migh varies from one urban area to another.
distribution_map <-ggplot() +geom_sf(data = ct_sf, fill ="lightgray", color ="white", size =0.2) +geom_sf(data = ev_sf,aes(size = count,color = cb_palette[5]),alpha =0.6) +scale_size_continuous(name ="EV Registrations", range =c(0.1, 15)) +labs(title ="Distribution of ZEVs in WA",subtitle ="Cumulative data to October 31, 2024",x ="longitude",y ="latitude" ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5, size =14),plot.subtitle =element_text(hjust =0.5, size =11) ) +guides(color ="none")ggsave("image/manu-ev-distribution.png", plot = distribution_map)distribution_map
Our null hypothesis is that people in WA have equal chance to adopt EV regardless of where they live, rural or urban.
First order properties of point process
Intensity Function
One important limitation of this analysis that needs to be addressed is that we use the WA state boundaries as our observation window. The drawback of this choice is quite obvious: among the three major metropolitan areas in WA, two are situated at the borders of the state – one with Oregon and the other with Idaho. This can lead to underestimations, as they can have less neighbors as the result of this bounding box.4
Intensity function we are interested here is the count of event , count of EVs and count of people in an area
Constructing the ppp object for estimating EV population intensity is a more straightforward one, we use the 548 distinct points and marked them with EV registration counts. When it comes to finding an appropriate way to estimate human population intensity, since ZCTAs are not a typical statistical division of US Census Bureau, thus, one way to approach this problem is to use the data of population counts by census tract and use the centroids of the census tracts to construct ppp object. Another way we explored is to take the 605 centroids of ZCTAs and request points estimate of population counts also from NASA APPEEARs.
zev_ppp
how the kernel density function smoothed the marked ppp?
That is, the spatial distribution of population alone can not offer a comprehensive explanation of the EV spatial pattern, can be decent but not overall satisfying
urb_station_map <-ggplot() +geom_sf(data = ct_sf, fill ="lightgray", color ="white", size =0.2) +geom_sf(data = wa_stations_sf, color = cb_palette[3], alpha =0.6) +geom_sf(data =to_4326(urban_poly), aes(fill ="Urban Areas"), color ="pink", alpha =0.3) +scale_fill_manual(name ="Legend", values =c("Urban Areas"="pink")) +labs(title ="Distribution of Public EV Charging Stations in WA",subtitle ="urban areas" ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5, size =16),plot.subtitle =element_text(hjust =0.5, size =12) )ggsave("image/manu-urb-stations.png", plot = urb_station_map)urb_station_map# 4326
station_ppp
[1] 2647
relative_int(station_ppp, zev_ppp)
positive and significant, clustering of like values
a computational approach.
Low Medium High
70 332 2245
Low
Medium
High
67
250
2345
55
263
2344
61
274
2327
57
261
2344
[1] 1000 3
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000000 0.000000 0.000000 0.007889 0.000000 10.000000
References
International Earth Science Information Network (CIESIN) - Columbia University, Center for. 2018. “Gridded Population of the World, Version 4 (GPWv4): Population Count Adjusted to Match 2015 Revision of UN WPP Country Totals, Revision 11.” Palisades, NY: NASA Socioeconomic Data; Applications Center (SEDAC). https://doi.org/10.7927/H4PN93PB.
Footnotes
the reason that I did not simply use ZCTAs is because though the codebook provided describe the location variable as the center of postal code areas, but the longitude and latitude provided did not align well with the centroids of ZCTAs from tiger/line shapefile, with cases that more than one points fall within a ZCTA↩︎
Center for International Earth Science Information Network - CIESIN - Columbia University (2018). Gridded Population of the World, Version 4 (GPWv4): Population Count Adjusted to Match 2015 Revision of UN WPP Country Totals, Revision 11. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). doi: 10.7927/H4PN93PB. Accessed December 4, 2024.↩︎
these peripheral areas might be affected by artificial lines?↩︎